Genetic Connections and Convergent Evolution of Tropical Indigenous Peoples in Asia

Abstract Tropical indigenous peoples in Asia (TIA) attract much attention for their unique appearance, whereas their genetic history and adaptive evolution remain mysteries. We conducted a comprehensive study to characterize the genetic distinction and connection of broad geographical TIAs. Despite the diverse genetic makeup and large interarea genetic differentiation between the TIA groups, we identified a basal Asian ancestry (bASN) specifically shared by these populations. The bASN ancestry was relatively enriched in ancient Asian human genomes dated as early as ∼50,000 years before the present and diminished in more recent history. Notably, the bASN ancestry is unlikely to be derived from archaic hominins. Instead, we suggest it may be better modeled as a survived lineage of the initial peopling of Asia. Shared adaptations inherited from the ancient Asian ancestry were detected among the TIA groups (e.g., LIMS1 for hair morphology, and COL24A1 for bone formation), and they are enriched in neurological functions either at an identical locus (e.g., NKAIN3), or different loci in an identical gene (e.g., TENM4). The bASN ancestry could also have formed the substrate of the genetic architecture of the dark pigmentation observed in the TIA peoples. We hypothesize that phenotypic convergence of the dark pigmentation in TIAs could have resulted from parallel (e.g., DDB1/DAK) or genetic convergence driven by admixture (e.g., MTHFD1 and RAD18), new mutations (e.g., STK11), or notably purifying selection (e.g., MC1R). Our results provide new insights into the initial peopling of Asia and an advanced understanding of the phenotypic convergence of the TIA peoples.


Introduction
Asia is home to approximately 370 million self-identified indigenous peoples, comprising two-thirds of the total indigenous population in the world. These populations are culturally and physiologically diverse and have been largely shaped by their local environment and subsistence. These unique characteristics offer an incredible opportunity to study human evolution and have long been a topic of interest Article ß The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons. org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. and speculation. Archaeological records document a long history of modern human occupation in Asia, which probably occurred 50-70 ka along the coast of the Indian subcontinent to the Sundaland, subsequently reaching the Sahul Shelf (Bae et al. 2017). Several rainforest tribal populations scattered throughout the tropical areas of Asia and Oceania are thought to be descended from the original inhabitants of Asia, including the Melanesians, the Aboriginal Australians, and the Negritos currently found in the Andaman Islands, Peninsular Malaysia, and the Philippine Islands (Bae et al. 2017). This hypothesis was first proposed based on morphological features of these people and has received substantial attention in recent years due to mounting evidence from anthropologic and genetic studies (Oppenheimer 2009;Malaspinas et al. 2016;Mondal et al. 2016;Skoglund et al. 2016;Yelmen et al. 2019;Wang et al. 2021). The arrival of Austroasiatic and Austronesian farmers from the north during the Late Paleolithic and Neolithic era forced the early inhabitants into separate areas, which added to the complex demographic history of these peoples (Bellwood 2007;Jinam et al. 2012;Lipson et al. 2014). Many genetic studies have highlighted substantial distinction among geographically separated indigenous populations for their distinct genetic affinities and admixture patterns (Endicott et al. 2003;Reich et al. 2010Reich et al. , 2011Jinam et al. 2017;GenomeAsia100K Consortium 2019), but only a few have characterized the genomic footprints of their possibly shared evolution (Hoh et al. 2020).
One question that has long been debated is whether the distinctive physical features (e.g., small body size, dark skin, curly hair, and broad nose, collectively known as "Negritolike" phenotypes) shared by the tropical hunter-gatherers are ancestral or derived as a result of genetic convergence. Migliano et al. (2013) identified various signals of growthrelated positive selection in different tropical indigenous populations, whereas Bergey et al. (2018) reported shared natural selections on height-related pathways in African and Asian rainforest hunter-gatherers. Relatively more analyses were carried out in single tropical indigenous populations or geographically related indigenous groups. Novel adaptive signals of height, hair, nasal morphology, and those in response to some extreme environmental stresses have been reported in each of these populations or groups (Liu et al. 2015;Lopez et al. 2019;Zhang et al. 2021). However, the current understanding of the genetic basis of the dark skin phenotype is rather limited. Over the past few decades, tremendous advances have been made in our understanding of the evolutionary history of light skin in Europeans and East Asians, whereas only a few recent studies revealed the skin color variations and related loci in populations of African ancestry (Bonilla et al. 2005;Beleza et al. 2013;Crawford et al. 2017;Hernandez-Pacheco et al. 2017;Lloyd-Jones et al. 2017;Martin et al. 2017). Interestingly, some of the dark pigmentation-associated variants could have been under parallel evolution in different continents (Crawford et al. 2017). An alternative argument is that dark skin, as well as other "Negrito-like" phenotypes, are probably products of convergent adaptations, as evidenced by extensive genetic differentiation among these populations (Endicott et al. 2003;GenomeAsia100K Consortium 2019). Although exposed to intensive ultraviolet (UV) radiation all year round at similar latitudes, the indigenous populations are more darkly pigmented than their counterparts. This was evidenced by visual impressions and reported melanin measures (demonstrated in supplementary note S1, Supplementary Material online). The long history and simple modes of subsistence (e.g., foraging, pastoralism, and horticulture) of these populations would give special insights into the genetic mechanism of human pigmentation evolution.
Efforts have been made to investigate the full landscape of human genomic diversity in Asia (HUGO Pan-Asian SNP Consortium 2009;GenomeAsia100K Consortium 2019), and the importance of genetic data of minority groups in understanding human evolution and disease has been highlighted (Ben-Eghan et al. 2020). However, the lack of comprehensive geographical samples and full characterization of the genomic variants have limited our understanding of the genetic relationship and local adaptations of the tropical indigenous Asian (hereafter abbreviated as TIA) populations. In this study, we aim to explore the genetic connections of the TIA populations that share similar phenotypes, and further illustrate the putative genetic mechanisms underlying their phenotypic convergence. We compiled and analyzed genome-wide single-nucleotide polymorphism (SNP) data of 2,845 sequenced samples and 3,214 genotyped samples from global populations. These data provide especially high geographical coverage for the TIA populations that can be contextualized using very dense non-TIA samples, including tropical samples from Africa and the Americas, and a wide range of nonindigenous samples in Asia (supplementary fig. S1A-C, Supplementary Material online). The TIA populations were further split into two subcategories-one is the anthropologically identified Negrito with the above-mentioned outstanding aboriginal phenotypes, denoted herein as Negritolike TIA (NL-TIA); the other includes the non-Negrito-like TIA (nNL-TIA) populations. Detailed information on the samples and the groupings can be found in supplementary note S2, Supplementary Material online. We note that the Oceanic populations, although not located in the Asia continent, were also included in our analyses as a group of TIAs, considering that Southeast Asia and the Pacific islands are tightly related in modern human migration history (Gibbons 2001;Xu et al. 2012). We investigated the genetic relationship and ancestry makeup of the TIA populations, based on which we characterized the natural selections by focusing on the genetic architecture of previously identified pigmentation-related loci. We further explored the adaptive hypothesis of convergent evolution by evaluating these candidates, and finally developed a comprehensive model for the evolution of the dark skin phenotype in different geographical TIA populations.

Results
Genetic Affinity and Distinction of the TIA Populations Tropical indigenous populations are highly structured in genetics, consistent with their geographic relatedness ( fig. 1 and Deng et al. . doi:10. . 1A), and was enriched in genes involved in olfactory transduction (hsa04740; false discovery rate-corrected P¼0.2 Â 10 À4 by clusterProfiler; Yu et al. 2012).
Despite the fact that most of the global populations have multiple sources of genetic components, ADM, MLS, and the Oceanic populations contributed three additional ancestry components and showed minimal evidence of recent admixture ( fig. 1B). This result is consistent with the small effective population size and long identical-by-descent (IBD) tracts of these populations (supplementary fig. S3, Supplementary Material online). The ancestry makeup of these three populations was more homogeneous than the African rainforest hunter-gatherers, who are also dominated by a group-specific genetic component. However, PHI was characterized as admixed descendants of the Oceanic populations and the Austronesians ( fig. 1B). The nNL tropical indigenous The ancestry components were inferred based on autosomal SNPs, mtDNA, and Y-chromosome data. The autosome-based inference is provided by ADMIXTURE analysis at K ¼ 13. The mtDNA and Y haplogroups for some populations were obtained from the literature, whereas those for the others were estimated from the SNP-array or NGS data analyzed in this study (see supplementary table S1, Supplementary Material online, for more details). One portrait is shown for each group of tropical indigenous peoples, representing 1, San from southern Africa; 2, Batwa from central Africa; 3, Irula from mainland India; 4, Jarawa from the Andaman Islands; 5, Bateq from Peninsular Malaysia; 6, CheWong from Peninsular Malaysia; 7, Bajo from Indonesia 8, Aeta from the Philippines; 9, Papuan from Papua New Guinea; 10, Ticuna from Central America; 11, Quechua from South America. Photo sources: https://www.gettyimages.in/ (last accessed December 29, 2021) for 1, 2, 4, and 8-11; https://www.pinterest.com (last accessed December 29, 2021) for 3; https://www.vladivlad.com (last accessed December 29, 2021) for 7; Photos 5 and 6 were taken from the fieldwork. NL; Negrito-like; nNL, non-Negrito-like; AFR, Africa; SAS, South Asia; MSEA, mainland Southeast Asia; ISEA, island Southeast Asia; OCN, Oceania; EAS, East Asia; AMR, the Americas. Convergent Evolution of the Tropical Indigenous Peoples . doi:10.1093/molbev/msab361 MBE populations from South Asia, island Southeast Asia, and the Americas showed similar patterns of ancestry makeup with the nonindigenous populations in these three regions, respectively, and were less likely to be affected by the NL-TIAs.
Consistent with the autosomal genetic pattern, mitochondrial DNA (mtDNA) haplogroups were generally restricted in local regions. Populations with shared genetic ancestry may share the same mtDNA haplogroups ( fig. 1B and supplementary table S1, Supplementary Material online). For instance, haplogroup M21, which possibly represents the ancient maternal components in Southeast Asia (Macaulay et al. 2005;Hill et al. 2006Hill et al. , 2007, is the major haplogroup in NL-(65%) and nNL-TIAs (60%) of mainland Southeast Asia; Haplogroup P, which is commonly found in Oceania (Friedlaender et al. 2005), was identified in PNG (24%) and PHI (12%). The Ychromosome data confirmed the cross-continent barrier but showed some levels of interaction among populations within the continent ( fig. 1B and supplementary table S1, Supplementary Material online). For example, the haplogroup S, which was thought to be centered on the highlands of Papua New Guinea, was identified in several males of PHI and MLS. Although different subtypes of haplogroup S were assigned to each region (S1d and S1a in PNG, S3a in PHI, and unspecified types noted as S* in MLS), they could have been inherited from a common paternal lineage dated back to 48 ka. In addition, ADM shared the D1 haplogroups with Tibetans (supplementary table S1, Supplementary Material online), a group of highland indigenous peoples neighboring South Asians. These results collectively suggest sex-biased and ancestry-biased human migration and admixture history of the TIA populations.

A Shared Genetic Ancestry of the TIA Populations
The intrinsic genetic connections among the TIA populations indicated by the Y chromosome were further consolidated on the autosomes by the unsupervised ADMIXTURE analysis at K ¼ 5, where we found one basal component specifically prevailing in Asia, with leading proportions in the Oceanic populations, followed by some South and Southeast Asian populations ( fig. 2A and supplementary fig.  S4A, Supplementary Material online). Interestingly, this ancestry was substantially enriched in the TIA populations, especially in the NL-TIAs. This result was supported by eight of the ten replication analyses of ADMIXTURE with random seeds (supplementary fig. S4B, Supplementary Material online). The basal component is unlikely a result of random genetic drift, which may have large effects on the isolated populations but would casually shape them to different directions; nor is it likely driven by recent gene flow, as it is enriched in geographically discrete TIA populations that are largely hindered by geographical discontinuity.
We speculate this basal Asian (bASN) ancestry to be an ancient lineage, as evident by the following. First, the bASN ancestry is negatively correlated with the Altai Neanderthal introgression in Asian populations (r 2 ¼0.62 and P ¼ 1.26 Â 10 À6 for the mainland populations; r 2 ¼0.56 and P ¼ 0.005 for the island populations), suggesting that the peopling of Asia might have occurred prior to the Altai Neanderthal introgression (directly or indirectly) to the modern human populations in this region ( fig. 2B). Second, the bASN-specific alleles (represented by the Papuan-specific alleles) possibly originated from the ancient gene pool of the modern human ancestry, as they share a deep root with the Africans and archaic hominins on the maximumlikelihood tree ( fig. 2C), in contrast to those specific to the European and East Asian ancestries (represented by presentday Europeans and East Asians, respectively) (supplementary fig. S5, Supplementary Material online). Moreover, we recapitulated this shared ancestry in some ancient genomes yielded from Neolithic Southeast Asians when they were projected in the ADMIXTURE analysis ( fig. 2D). Intriguingly, the bASN ancestry is centered in the Indo-China Peninsula, and is positively correlated with time (r 2 ¼0. MixMapper ) fit all the tested TIA populations, including the Melanesians from Bougainville, PHI, Malaysian Orang Asli, and ADM, as combinations of two ancestry components: one closely related to PNG branch (representing the bASN ancestry); another split deeply from the East Asians (supplementary fig. S9A, Supplementary Material online). Intriguingly, the relative proportions are highly correlated with those inferred using ADMIXTURE (supplementary fig. S9B, Supplementary Material online). Admixture graphs constructed using AdmixtureBayes (Nielsen 2018) and qpGraph (Patterson et al. 2012) showed that PNG, ADM, and MLS received ancestries that split deeply with Europeans ( fig. 2F), and they contributed greatly to their surrounding nNL-TIAs and nonindigenous populations in Asia. For the South Asian groups, the Onge-related ancestor contributes 80% ancestry to Irula (nNL) and 48% ancestry to Gujarati (nonindigenous), with the remaining ancestry contributed by the European ancestor (

Shared Adaptations of the TIA Populations
Given the wide range of geographical TIA populations studied and the shared bASN ancestry identified, we intended to trace the footprints of this common ancestry and explore its possible functional effects on present-day TIA populations. We estimated the density of the bASN-affected SNPs and the Convergent Evolution of the Tropical Indigenous Peoples . doi:10.1093/molbev/msab361 MBE frequency of the bASN-derived haplotypes along the genome, and found some genomic regions were enriched with the bASN ancestry, and thus could be regarded as putative candidates of bASN-related adaptations (supplementary table S2, Supplementary Material online). We focused on the shared adaptations in at least two out of the four representative TIA populations, which implies possible parallel or convergent evolution (table 1). Although some other TIA populations also showed enriched bASN ancestry, the sample size for these populations was insufficient (<5 samples for each) for unbiased frequency estimation. One of the outstanding examples is a bASN-derived haplotype in LIMS1. It consists of 40 bASN alleles spanning the entire gene ($150 kb), all of which are derived alleles. The frequency of this haplotype is 0.97 in PNG, 0.8 in ADM, <0.3 in MLS and PHI, and is very rare in other worldwide populations. The high frequency and long stretch of this haplotype suggested that it could be adaptive in PNG and ADM (supplementary fig. S11, Supplementary Material online). LIMS1 encodes an adaptor protein that plays a role in integrin-mediated cell adhesion or spreading. It has also been reported to be hairiness-associated in East Asians (Endo et al. 2018). Downstream to LIMS1 is EDAR, a wellrecognized gene that is responsible for the facial and hair features of East Asians (Adhikari et al. 2016;Wu et al. 2016). The Asian prevalent allele at the key missense variant in EDAR, rs3827760-G, appears in lower frequency in the Southeast Asian indigenous populations (frequency¼0.25-0.44), but did not show signals of adaptation in any of these populations. We also identified a group of candidate haplotypes ($51 kb) in COL24A1, the frequency of which is higher in PNG (0.47), MLS (0.31), Vellalar (an nNL-TIA population in South Asia; 0.28), CheWong (an nNL-TIA population in Southeast Asia; 0.25), Tibetan (0.20), and African populations (0.25-0.42) than the rest of populations studied. COL24A1 encodes for collagen XXIV predominantly functioning in osteoblast differentiation and bone formation and is associated with some morphological traits in humans, such as waist-hip ratio and height (Lotta et al. 2018;Kichaev et al. 2019). More interestingly, a 56.9-kb adaptive haplotype in DDB1/DAK was identified in ADM (0.95), PHI (0.5), and African populations (0.52-0.83), and it included a key missense variant (rs2260655, c.553G>A, p. A185T) associated with the melanin index of Africans (Crawford et al. 2017). This observation supports the shared evolution of skin pigmentation between African and Asian populations proposed by Crawford et al. (2017), by complementing the hypothesis with more significant evidence via the understudied tropical indigenous populations in South and Southeast Asia. Other shared signatures of selection in several TIA populations include LRBA aiding the secretion and membrane deposition of immune effector molecules, NRXN3 encoding neurexins that function in the vertebrate nervous system as cell adhesion molecules and receptors, and several genes involved in the metabolic pathways (e.g., ACBD6, FDXR, INPP4B, and SH3PXD2A) (table 1).
We alternatively employed a de novo screening strategy to search for more shared selective sweeps, by estimating a composite statistic called F CS based on the integrated haplotype score (iHS) and locus-specific branch length (LSBL) (see Materials and Methods). Remarkably, we identified two genomic regions that exhibited sweep signals in all four populations (supplementary fig. S12A, Supplementary Material online). One of these regions lies in NKAIN3, which encodes a member of the Na þ /K þ -ATPase b-subunit interacting protein family that is involved in the regulation of the Na þ transport (supplementary fig. S12B, Supplementary Material online). This gene is specifically expressed in the brain and nerves according to the Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/home/, last accessed December 29, 2021) and is associated with certain neurological diseases. Notably, it has been reported to be heightassociated in the Europeans (Tachmazidou et al. 2017). The other signal is in C16orf97, an lncRNA gene. GTEx shows that this gene expresses significantly in testis as opposed to other organs. We also found that several genes showed different adaptive loci among the four populations. One example is TENM4, which encodes the teneurin transmembrane protein 4 that plays a role in establishing proper neuronal connectivity during development (supplementary fig. S12C, Supplementary Material online). Adaptive genes shared by at least two of the four populations were significantly enriched in neurological functions (supplementary fig. S12D, Supplementary Material online). There was also evidence for shared adaptation on the Negrito-like phenotypes, attributing to identical candidate genes across TIA populations or different genes for similar biological processes (table 2). Notably, there were a larger number of candidate genes shared across the Wallace-Huxley's line (between ADM/MLS and PHI/ PNG), than those identified specifically in population grouping located at either side (within ADM/MLS or PHI/PNG), suggesting that those shared signals are unlikely to be consequences of recent gene flow between populations. These results indicate complex genetic mechanisms underlying the phenotypic convergence of TIA populations.

Evolution Scenarios of Pigmentation-related genes in the TIAs
To characterize the genetic architecture of skin pigmentation primarily in the four representative TIA populations (again, some populations with enriched bASN ancestry and even darker skin pigmentation, such as Melanesians from Bougainville, were excluded due to limited samples size), we compiled a comprehensive list of 1,057 pigmentation genes in human from previous genetic studies or biological pathways (supplementary table S3, Supplementary Material online, see Materials and Methods). At 103 of these variants that affect 85 genes (supplementary table S4 and figs. S13-S15, Supplementary Material online), the risk alleles were reported in previous association studies and were further determined as dark pigmentation alleles (DPAs) reported in the Africans or light pigmentation alleles (LPAs) reported in the Eurasians (see Materials and Methods). Since most of these risk alleles were identified under the additive model in association studies, we assumed that the two alleles at a candidate SNP have opposite effects on human coloration. Approximately 88.8% of the pigmentary variants are standing variants in the Africans, and 11.2% or less (some DPAs possibly drifted out    S16B and C, Supplementary Material online), thereby confirming the genetic basis for the dark skin phenotype. We hypothesized that the dark pigmentation of the TIAs may have originated from the bASN ancestry, as we observed that the DPA-based unweighted polygenic risk score (PRS)number of DPAs in each individual-is positively correlated with the bASN ancestry in mainland Southeast Asia (r 2 ¼0.23 and P ¼ 2.25 Â 10 À5 across individuals; r 2 ¼0.77 and P ¼ 0.006 across populations) and South Asia (r 2 ¼0.51 and P ¼ 3.28 Â 10 À10 across individuals; r 2 ¼0.59 and P ¼ 0.016 across populations) ( fig. 3 and supplementary fig. S17, Supplementary Material online). We further analyzed the flanking haplotypes of the key DPAs reported in Africans, and found that most of these are IBD in the African and non-African populations and are likely to be the ancestral states of the modern human, implying possible parallel evolution. Notably, the derived allele (G) at rs872257, an intronic variant of the antisense gene RP11-125B21.2, had an excess of frequency in all the African populations, NL-TIAs, and PNG (allele frequency [AF]¼0.5-0.94) over the other Eurasians (AF¼0.20-0.45), and two DPAs (rs6510760-A and rs112332856-C) in MFSD12 were enriched in the TIA populations in each geographic region. However, in most of the cases, the DPAs were lost in several of the TIA populations during evolution. For instance, we observed the accumulation of three DDB1-DAK DPAs (rs11230664-C, rs376397-G, and rs2513329-T) in ADM and PHI (0.95 in ADM; 0.5 in PHI; 0.17-0.89 in the Africans; 0-0.23 in other Eurasians), which are in strong linkage disequilibrium (LD) with the aforementioned missense mutation at rs2260655 (fig. 4). Although selective sweep at this site was only detected in PHI, we suspect that these DPAs are also beneficial in ADM as these show . This haplotype is constituted of 60 alleles, including 42 putative DPAs, and is likely an ancient haplotype harbored by the archaic hominins and some African individuals. A similar example is a 21.5-kb haplotype carrying a highly conserved regulatory variant (rs113432620-G; an MLS-specific allele, AF¼0.25) and other 13 putative DPAs at RP11-6N13.1. The allele age at rs113432620-G was estimated to be older than 400 ka. This result, again, suggests that dark skin in the TIAs could be one of the most important phenotypes inherited from African ancestry. However, the adaptation of inherited DPAs mingled with that of LPAs in the skin-color determination of TIAs. We identified adaptation at the OCA2 loci in the TIA populations and found that the adaptive alleles matched the Asian prevalent alleles, which could be a consequence of recent admixture (supplementary fig. S16D, Supplementary Material online). In addition, gene flow plays an essential role in facilitating the convergent evolution of dark pigmentation. For instance, the only putative DPA shared by the four representative TIA populations, MTHFD1-rs138662720-T, is related to different haplotypes between Africans and TIAs. The haplotypes in TIAs are similar to the Altai Neanderthal sequences and were inferred to be introgressed haplotypes by ArchaicSeeker 2.0 (Yuan et al. 2021). They were clustered on an independent basal branch in the haplotype network, whereas those in Africans were on the shared branches with other Eurasians (fig. 4). MTHFD1 plays a role in the folate-mediated 1-carbon transfer process, which is influenced by the environmental UV levels and possibly evolves in parallel with skin pigmentation (Jones et al. 2018). Interestingly, rs138662720 is located in the promoter and enhancer elements of MTHFD1 with signals of H3K4me3þ and H3K27Acþ in melanocyte, and thus may affect MTHFD1 expression (supplementary fig. S18, Supplementary Material online). Archaic introgression also facilitates the convergent evolution at different sites of the same gene across the TIA populations. One notable example is RAD18, which is recruited to replication forks and facilitates the DNA repair after UV exposure (Liu et al. 2012). MLS and PNG showed differentiated signals of positive selection at this gene, and the adaptive haplotype in MLS is likely an introgressed haplotype from Altai Neanderthal ( fig. 5A and  B). This haplotype reaches a frequency of 0.23 in MLS, but is absent in other worldwide populations. We also observed  . 4). The allele age was estimated to be 5.0 and 8.8 ka in the two populations, respectively. Although identical mutations across human populations more likely reflect an ancestral event, we could not rule out the possibility of independent de novo mutations in this case. This locus is around the enhancer and promoter region of STK11 with high DNase sensitivity (supplementary fig. S19, Supplementary Material online). Therefore, we speculated any mutations at this locus could have altered the pattern of gene expression thence phenotypic consequences.

Genetic Convergence of Pigmentation on the Gene Level
Although identical adaptive variants were identified across the TIA populations, considerable variations showed how different a pigmentation gene functions in response to the selective pressure on skin color in different populations. Except for the aforementioned example of RAD18, several examples of convergent evolution at the gene level were discovered. Notably, NRG1 involved in melanocyte development (Buac et al. 2009) showed distinct signals in ADM and MLS ( fig. 5C); different CEP112 signal loci were identified in PHI and PNG ( fig. 5D), and this gene is associated with iris color in South Asians (Jonnalagadda et al. 2019).
Previous studies have suggested that the dark skin phenotype in Africans could be a result of strong functional constraints on pigmentary genes, whereas relaxation of the constraints in the Eurasians leads to the light skin phenotype in these populations. MC1R is an exemplar gene under purifying selection in the Africans (Harding et al. 2000; Sturm et al.   Makova and Norton 2005). In our data, we found a significant excess of mutation burden on the collective pigmentation-related genes compared with the whole genome in most Eurasian populations, except for some TIAs such as PNG and ADM (supplementary fig. S21A, Supplementary Material online). This observation may be largely attributed to the accumulation of functional mutations in the Eurasian populations, as most of the pigmentation-related genes were reported in these populations, and to a lesser extent, constraint relaxation could have occurred in some genes in Eurasians. MC1R showed signs of purifying selection in PNG and ADM, including the low mutation burden and small nucleotide diversity in the coding sequences of this gene that are comparable to those in Africans (supplementary fig. S21B and C, Supplementary Material online). PNG and ADM also showed a close genetic affinity with Africans on this gene (supplementary fig. S21D, Supplementary Material online). Moreover, the segregating sites in the four representative TIA populations were highly consistent ( fig. 5E). These polymorphisms were also observed in the other Eurasian populations but did not overlap with those shown in Africans. These results suggest that purifying selection of MC1R could be a consequence of convergent evolution in the African and Asian indigenous populations.

Discussion
In this study, we reveal that different geographical TIA populations are genetically distinct from each other, but also provide compelling evidence for the shared ancient Asian ancestry of TIAs. Our analysis showed no supporting evidence that the identified bASN ancestry resulted from shared evolution of archaic introgression, recent admixture, or other random effects. Rather, it is more likely a sign of an independent pulse of human migration to Asia, involving the ancestors of the present-day Negritos and Oceanic populations. Previous genetic studies focusing on ancient genomes or single present-day indigenous populations have suggested an ancient initial occupation of Southeast Asia, followed by several layers of migrations from different ancestry sources and different places (Rasmussen et al. 2011;Xu et al. 2012;Lipson et al. 2018;McColl et al. 2018). The whole-genome sequencing data of diverse TIA populations we analyzed poses an additional advantage on tracing the ancestry footprints in the present-day human, and connecting the genomes with traits.
Taking advantage of the whole-genome sequences that we generated for the Malaysian Orang Asli samples and those collected from other published studies, we present the first genome-wide survey of local adaptations in multiple geographically dispersed groups of TIA populations and show that these populations have adapted to the challenging habitats of the tropical rainforest via both shared and distinct genetic determinants. For instance, we found evidence of strong sweeps shared by the four representative TIA groups, targeting a hair-associated gene LIMS1, and a height-related gene COL24A1. Evidence for the shared adaptations of the Negrito-like phenotypes across TIA populations was elaborated on the gene level and the pathway level. Meanwhile, plenty of differentiated adaptations were reported by previous studies and also observed in our data.
An inevitable issue regarding the studies on TIA populations is the genetic mechanism of the dark skin color phenotype. We explored the dark pigmentation evolution primarily based on the four representative TIA populations for several advantages, including outstanding dark skin phenotype evidenced by the literature, long population history supported by the enriched bASN ancestry, and less genetic interaction with the surrounding nonindigenous populations compared with other nNL-TIAs. This study provides a full landscape of the pigmentary variants and genes in TIA populations. Despite the lack of a comprehensive assessment of the skin color phenotype in the samples studied, we referred to the reported data in previous studies, and found that the TIA populations we focused on here have a higher melanin index than other populations inhabiting similar latitudes (supplementary note S1, Supplementary Material online). Therefore, it is conceivable to summarize the overall pattern of skin color evolution based on the genetic architecture of pigmentary variants in populations qualitatively differentiated in skin color. The priori pigmentary variants were obtained from the GWAS studies primarily carried out in the major continental groups, as very few results were reported in other minority populations. Kenny et al. (2012) reported a missense variant in TYRP1 that is associated with blond hair in Melanesians, but this variant was not captured in the sequences we analyzed.
Five conclusions can be drawn from the present analyses. First, the dark skin phenotype is likely characterized by extreme genetic heterogeneity, as a higher genetic load has been observed in the indigenous populations than in the nonindigenous populations in each geographic area. Second, a large number of the DPAs are likely to be IBD among TIA populations and were probably inherited from the modern human ancestry. These results support that the early dispersal of the dark-skinned people out of Africa was not accompanied by significant changes in skin color (Jablonski 2018), and this is consistent with an individual case observed by Crawford et al. (2017). Third, we observed some putative DPAs are likely de novo mutations in TIAs, or adapted from the archaic introgression, suggesting the possibility of convergent evolution. Previous studies have suggested convergent evolution of light skin in Europeans and East Asians and that of the dark skin among African groups (Norton et al. 2007;Martin et al. 2017), and our analyses complement the hypothesis by providing additional supporting evidence. Fourth, the convergent purifying selection also drives pigmentation in the tropical indigenous populations, such as MC1R in the Africans, PNG, and ADM. A previous study did not support the strong selective constraint on MC1R in the Melanesians (Norton et al. 2015), possibly due to the genetic differentiation between near and remote Oceanic populations. Finally, some of the LPAs are also adaptive in TIA groups, which could evolve from standing variants or gene flows between populations, again suggesting the complexity of the skin color phenotype. Based on our data, we conclude that the phenotypic convergence in skin color across the geographically diverse TIA populations is Deng et al. . doi:10.1093/molbev/msab361 MBE likely a joint consequence of shared genetic ancestry and convergent molecular adaptation.
Our results supported the hypothesis that the darkpigmented skin evolved in early hominins in Africa, and that formed the substrate of the skin color of the ancient Asian ancestry. Populations dispersing and living along the coasts of southern and southeastern Asia in the Late Pleistocene experienced strong and seasonally relatively invariant UV levels, and that favored the maintenance of dark pigmentation, probably involving selection on ancestral gene variants and recent mutations (Jablonski 2021). Evidenced by the adaptive introgressed haplotypes in MTHFD1 and RAD18, we established the genetic contribution of Neanderthals to the skin pigmentation in TIA populations, which was not recognized before despite that such connection was only identified in the light-skinned modern Eurasians (Dannemann and Kelso 2017;Rotival and Quintana-Murci 2020). Human skin pigmentation is jointly determined by different combinations of genetic and cultural conditions over time and through space (Jablonski 2021). Our results showed that UV-induced folate degradation could be related to the dark skin phenotype in TIA populations. MTHFD1 plays a role in the development of neural tube defects-a congenital disorder due to folic acid deficiency in pregnant mothers. Therefore, adaptation at MTHFD1 may help to improve reproductive success. Notably, folate-dependent reproduction is not necessarily a driving force of human skin color adaptation to UV. MTHFD1 and other folate-related genes could have evolved in parallel with the causal variants determining pigmentation (Jones et al. 2018). Further studies are needed to investigate the interaction of folate, skin pigmentation, and UV conditions, and to figure out how the identified MTHFD1 variant affects skin pigmentation. In addition to UV, selective pressures on the immune function could also be a reason, as eumelanin is important in enhancing the barrier function of the skin. The tropical indigenous populations have been facing extreme epidermal environments, and some of the adaptive genes (e.g., LRBA) we identified are involved in immune processes. It should be noted that the effects of material culture and traditions must also be considered, even if the strength of these influences cannot be easily quantified. Some TIA populations usually wear few clothes and their diets were rich in fish (Bolton 1972;Sreenathan et al. 2008;Headland 2015), and these may have contributed to the selective constraint on pigmentation. It was also proved that dark-pigmented people can achieve sufficient vitamin D with adequate duration of sun exposure (Nimitphong and Holick 2013).
Despite the lack of experimental evidence for direct connection between the shared adaptive alleles and the shared phenotypes across TIA populations, those specific patterns of population allele sharing and reported genotype-phenotype associations could help to establish such connections. We caution that the model provided in the present study is greatly simplified. Since human skin color is a highly complex polygenic trait, we would not be able to quantify the effects of parallel and convergent adaptation without variant functional validations. However, our efforts on demonstrating the molecular evolutionary pattern enable the life science communities to better understand the human evolutionary process via skin color adaptation, and provide insights into the nature of parallel or convergent evolution. With in-depth genetic and phenotypic studies, as well as functional assessments of the putatively selected variants, it is therefore plausible to characterize genetic mechanisms of human skin color adaptation. All the samples were sequenced at 30Â coverage for 150bp paired-end reads, using Illumina Hiseq X10 version 2.5 by Wuxi NextCODE in Shanghai. The raw reads were aligned to the reference GRCh37 human genome using BWA (Li and Durbin 2010). The low mapping quality reads (MAPQ<20) and potential duplicates were removed using SAMtools (Li et al. 2009) and Picard toolkit, respectively. We then realigned the indels using IndelRealigner from GATK (McKenna et al. 2010), and used the GATK variant quality score recalibration to adjust the base quality score bias from the sequencer. Variant calling was performed using HaplotypeCaller (McKenna et al. 2010;DePristo et al. 2011). After filtering out the low-quality variants, we finally obtained 51,722,790 biallelic SNPs for further analyses. All of these samples were self-reported to be unrelated, but we identified first-, second-, and third-degree relatedness involving 21 of them, as suggested by KING version 2.1.2 (Manichaikul et al. 2010). Considering that these indigenous populations are highly inbred, we only removed four individuals showing the first-degree relatedness and combined the rest genomes with public data sets. A detailed description of samples and data can be found in supplementary note S2, Supplementary Material online.  Consortium 2015); 2) the Simons Genome Diversity Project (Mallick et al. 2016); 3) the Estonian Biocentre Human Genome Diversity Panel (Pagani et al. 2016); 4) highcoverage genomes of indigenous populations from South and East Asia (Lu et al. , 2017Mondal et al. 2016); 5) high-density SNP-array data of worldwide populations (Reich et al. 2011;Henn et al. 2012;Schlebusch et al. 2012;Petersen et al. 2013;Pugach et al. 2013;Deng et al. 2014;Ko et al. 2014;Lazaridis et al. 2014;Patin et al. 2014;Aghakhanian et al. 2015;Liu et al. 2015). We generated two combined data sets. One was compiled from all the sequencing and SNP-array data, including 49,545 autosomal SNPs of 6,059 samples, which was used for inferring the general genetic relationship and ancestry makeup of populations; The second was restricted to the genome sequences, including 17,242,884 autosomal SNPs of 2,845 samples, used for the investigation of genome variations. Archaic and ancient data used in this study include: 1) high-coverage genomes of the archaic hominins, including Altai Neanderthal (Prüfer et al. 2014) and Denisovan (Reich et al. 2010); 2) worldwide Neolithicto-Paleolithic ancient human genomes (Allen Ancient DNA Resource, https://reich.hms.harvard.edu/allen-ancient-dnaresource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data, last accessed December 29, 2021, version 44.3).

Sample Collection and Data Processing
Variants Annotation, Genotype Imputation, and Phasing Functional consequences and the pathogenicity of the variants were assessed using the Variants Effect Predictor (McLaren et al. 2016). The ancestral allele at each variant was determined according to the ancestral sequences (e71) released by the 1000 Genomes Project. Variants were mapped to genes according to Ensembl 96 (GRCh37). Functional enrichment for genes was obtained using the annotation tool clusterProfiler v3.10.1 (Yu et al. 2012) and ToppGene (https:// toppgene.cchmc.org/enrichment.jsp, last accessed December 29, 2021) (Chen et al. 2009). Haplotype phasing and imputation were carried out using Shapeit2 (Delaneau et al. 2011) (supplementary note S2, Supplementary Material online).

Collection of the Pigmentation-Related Pathways, Genes, and Variants
We focused on the evolution patterns of some genes of particular interest, including genes involved in pigmentation-related pathways and ontologies defined by KEGG and Gene Ontology (provided by clusterProfiler v3.10.1; Yu et al. 2012), and genes previously reported to be associated with pigmentation (including iris, hair, and skin color with a shared genetic basis to some extent; Rees 2003;Sulem et al. 2007)

PCA and Inference of the Ancestry Components
We carried out approximate LD-pruning of SNPs by selecting a SNP in each of the 200-kb nonoverlapping windows, and then performed flashPCA (Abraham and Inouye 2014) and ADMIXTURE version 1.3.0 (Alexander et al. 2009) with all the options set to default. We specified K ¼ 2-26 in the ADMIXTURE analysis and performed ten replications for each K using random seeds. The cross-validation error was measured to assess the number of clusters with the best fit for each analysis.

Population Genetic Differentiation (F ST )
F ST was assessed following Weir and Cockerham (1984) to evaluate the overall population genetic distance and to identify local regions that are highly differentiated among the representative TIA populations. F ST was calculated for every single SNP, and was then weighted over the whole genome to construct the population phylogeny shown in figure 1A. We also estimated the four-population F ST for the four representative TIA populations, based on a 10-kb sliding window along the chromosome, stepping by 1 kb. The four-population F ST value for each gene was represented by that of the top window overlapping this gene. We normalized the F ST of all genes by square roots transformations, and genes showing normalized F ST >1.96 were selected as outliers.

Mitochondrial DNA and Y-Chromosome Analysis
The mtDNA haplogroups were identified using HaploGrep2 (Weissensteiner et al. 2016) based on PhyloTree 17 (van Oven 2015). The Y chromosome haplogroups were determined based on known Y-SNP markers from the International Society of Genetic Genealogy (ISOGG) phylogenetic tree 2019-2020 using in-house scripts. The Filippino mtDNA haplogroups were obtained from 75 individuals from Delfin et al. (2014). We filtered out individuals with a lower resolution of haplogroup classification, including the individuals with an overall rank <0.6 (provided by HaploGrep2; Weissensteiner et al. 2016) for the mtDNA, and those with estimated Y haplogroups staying on the ISOGG Y-DNA phylogenetic tree trunk. We performed BEAST v.2.6.0 (Bouckaert et al. 2014) to estimate the coalescent time of haplogroups. The age of mtDNA haplogroup MN (77,000 years, 95% CI¼61,400-93,200) (Fu et al. 2013) and NRY haplogroup CT-M168 (71,760 years, 95% CI¼69,777-73,799) (Karmin et al. 2015) were used for calibration of age estimation of mtDNA and NRY haplogroups, respectively.

Identification of the bASN-Derived Alleles
We inferred the bASN-derived alleles from the wholegenome sequences of ADM, MLS, PHI, and PNG, following Watanabe and Ohashi (2020). We first identified alleles with frequency>0.2 in any of the above four populations but <0.1 in European and East Asian populations. Considering possible gene flow between TIA and the surrounding non-TIA populations in South and Southeast Asia, we did not exclude those alleles carried by South and Southeast Asians. Next, LD coefficients (r 2 ) were calculated between SNPs located within 500 kb from each other using PLINK v1.90 (Purcell et al. 2007). Alleles with r 2 >0.5 were counted in as bASN-derived alleles.

Estimating the GS of bASN-Derived Alleles
We estimated the unweighted GS of the bASN-derived alleles in each ancient sample as follows: where m is the number of bASN-affected SNPs genotyped in the individuals and d i is the presence of the bASN-derived alleles at SNP. The 14,353 bASN-derived alleles were assigned into 1,551 blocks using a threshold of between-SNP-distance of 500 kb. We then calculated the weighted GS for each sample as: where N is the number of blocks, n j is the number of bASNderived SNPs in the block j, d ij is the presence of the bASNderived alleles at SNP i of block j. We used linear regression to examine the association between the GS and time for samples. Samples with a total number of genotyped SNPs less than the first quantile of all the ancient samples were excluded from this analysis. Geographic coordinates of the samples, that is, latitude and longitude, and the first ten PCs were included as covariates in the model:

Modeling of Admixture Graphs
We first applied AdmixtureBayes (Nielsen 2018) to estimate the top posterior admixture graphs, based on the covariance matrix of the AF profiles. Only 148,416 variants with AF in 0.05-0.95 in any of the populations used in our study and those with between-SNP-distance >10 kb were used in this analysis. For each AdmixtureBayes run, a total of 1,000,000 MCMC steps were carried out, with other parameters set to default. Multiple sets of populations were analyzed to obtain a full picture of the admixture graphs, with each graph inferred from an independent data set. The chimpanzee was used as the outgroup. We finally input the graphs estimated from AdmixtureBayes to qpGraph (Patterson et al. 2012) to test the goodness of fit. We accept only the graphs with an absolute value of the Z-score of the worst f 4 -statistic <3. The parameters of qpGraph include: outpop¼NULL, blgsize¼0.05, lsqmode¼YES, and diag¼0.0001.

Constructing the Haplotype Networks
Given the frequency (f) of an allele of interest, we identified the related haplotypes for this allele using the following procedures. First, we screened for the AF block involving this allele in each by 1) screening for the alleles at frequency f in the nonoverlapping 50-kb windows, 2) selecting out the windows with at least five alleles at frequency f, and 3) merging the adjacent windows and selecting the one with the target variant included. Second, we collected the union of the AF blocks identified in each population. Then, the haplotype networks were constructed and visualized at genomic regions of interest using NETWORK 10.1.0.0. To construct the haplotype networks, we selected 50% of the samples in populations with sample size >50, and make sure that the samples carrying the target alleles are included. We first calculated the median-joining network (Bandelt et al. 1999), and then removed the superfluous nodes using the maximumparsimony algorithm (Polzin and Daneshmand 2003).

Detection of Natural Selection
We calculated iHS using Selscan v1.2.0 (Szpiech and Hernandez 2014) for polymorphic sites in each population, and LSBL for whole-genome variants using in-house scripts.
For each SNP, we combined iHS and LSBL scores into a Fisher's score (F CS ) following Lopez et al. (2019): F cs ¼ Àlog 10 p iHS ð ÞþÀlog 10 p LSBL ð Þ; where p is the empirical p value across the genome. When calculating the LSBL statistic, we chose Han Chinese (CHB) as an outgroup, and a neighbor population with the closest genetic relationship (measured by F ST ) as a reference for each representative TIA population: BEB for ADM, Burmese for MLS, and Dusun for PHI and PNG. An outlier SNP was defined as an SNP with an F CS among the 5% highest of the genome. A putatively selected genomic region was defined as a 100-kb window presenting a proportion of outlier SNPs among the 5% highest of all windows, in four bins of SNP numbers. Windows containing less than ten SNPs were discarded.

Estimating the Allele Age
Allele ages were estimated using the Genealogical Estimation of Variant Age (GEVA) (v1 beta) based on the mutation clock model (Albers and McVean 2020). The initial effective population size and mutation rate were set to be 10 4 and 1.25 Â 10 À8 , respectively, as recommended in the manual.

Estimating Mutation Load
Variants were grouped into three categories according to the genomic evolutionary rate profiling (GERP) score, corresponding to different deleterious levels (2-4, mild; 4-6, moderate; >6, critical). We assigned a selection coefficient (s) for variants in each category following Boyko et al. (2008): s¼À4.5 Â 10 À4 for GERP in 2-4, s¼À4.5 Â 10 À3 for GERP in 4-6, and s¼À1 Â 10 À2 for GERP >6. The total mutation Convergent Evolution of the Tropical Indigenous Peoples . doi:10.1093/molbev/msab361 MBE load (L) is the sum of load for each locus (Kimura et al. 1963) under an additive model: where f is the derived AF, and h ¼ 0.5. We estimated L for the whole genome, for a set of genes (e.g., pigmentary genes), and the coding sequence region of MC1R in this study. To make it comparable across genes or gene sets, we averaged L by SNPs.

Consent for Publication
Not applicable.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.